Automating fault tolerance in high-performance computational biological jobs using multi-agent approaches
Similar resources
Automating Fault Tolerance in High-Performance Computational Biological Jobs Using Multi-Agent Approaches
Background: Large-scale biological jobs on high-performance computing systems require manual intervention if one or more of the computing cores on which they execute fail. This imposes not only a maintenance cost on the job, but also the cost of the time taken to reinstate the job and the risk of losing the data and execution accomplished by the job before it failed. Approaches which can proactively...
Fault Tolerance for High-Performance Applications Using Structured Parallelism Models
In recent years, parallel computing has increasingly exploited the high-level models of structured parallel programming, one example of which is algorithmic skeletons. This trend has been motivated by the properties of structured parallelism models, which can be used to derive several (static and dynamic) optimizations at various implementation levels. In this thesis we study the proper...
Automating Middleware Specializations for Fault Tolerance
General-purpose middleware solutions, by definition, cannot readily support domain-specific semantics without significant manual effort in specializing the middleware. This paper presents GRAFT (GeneRative Aspects for Fault Tolerance), which is a model-driven, generative, and aspects-based approach to specialize general-purpose middleware with failure handling and recovery semantics imposed by ...
Fault-Tolerance for High-Performance Multi-Module VLSI Systems Using Micro Rollback
In order to achieve fault tolerance, highly reliable systems often require hardware-supported concurrent error detection for all system components. Checkers are connected in the communication paths from each module to the rest of the system, reducing system performance by requiring either longer clock cycles or additional pipeline stages. The performance penalty of concurrent error detection ca...
Journal
Journal title: Computers in Biology and Medicine
Year: 2014
ISSN: 0010-4825
DOI: 10.1016/j.compbiomed.2014.02.005